Projective measurements of a single two-level quantum mechanical system (a qubit) evolving under 
a time-independent Hamiltonian produce a probability distribution that is periodic in the evolution 
time. The period of this distribution is an important parameter in the Hamiltonian. Here, we 
explore how to design experiments so as to minimize error in the estimation of this parameter. 
While it has been shown that useful results may be obtained by minimizing the risk incurred by 
each experiment, such an approach is computationally intractable in general. Here, we motivate 
and derive heuristic strategies for experiment design that enjoy the same exponential scaling as fully 
optimized strategies. We then discuss generalizations to the case of finite relaxation times, $T_2 < \infty$.



Introduction. Measurement adaptive tomography has 
recently been suggested as an efficient means of performing
partial quantum process tomography [5, 11]. Little
is known about optimal protocols when realistic experimental
restrictions are imposed, as opposed to the
case where one is allowed arbitrary quantum resources^1.
Indeed, even in the simplest examples, no bounds
have been given for the proposed protocols. Here, we give
analytic bounds on both non-adaptive and adaptive es- 
timation protocols for a Hamiltonian parameter estima- 
tion problem. Moreover, we derive estimation protocols 
which asymptotically achieve these bounds. Adaptive 
protocols are typically difficult to implement because a 
complex optimization problem must be solved after each 
measurement. We instead derive a heuristic that is easy 
to implement and achieves the exponentially improved 
asymptotic risk scaling of the optimal solution. 

Within the nuclear magnetic resonance (NMR) community,
similar concerns have motivated the use of
maximum entropy and maximum likelihood [3] methods
for obtaining spectra. Recently, sufficient computational
power has become available to make these methods
feasible for analyzing non-uniform data obtained from
high-dimensional NMR experiments [8].
These studies have produced qualitatively similar
strategies for how to best design experiments when each 
sample is expensive to collect. 

The paper is organized as follows. First, we define
the model Hamiltonian whose parameter we want to
estimate, along with our metric of success. Then
we give both frequentist and Bayesian lower bounds on
the risk derived from this metric. Finally, we derive
strategies which achieve the asymptotic scaling of these
bounds.

Problem statement. The model we consider is a qubit 
evolving under the Hamiltonian

$$ H = \frac{\omega}{2} \sigma_z . $$

Here $\omega$ is the unknown parameter whose value we want
to ascertain. We make the problem dimensionless by assuming
$\omega \in (0, 1)$. An experiment consists of preparing a
single known input state $|+\rangle$, evolving under the Hamiltonian
$H$ for a controllable time $t$ and performing a measurement
in the $\sigma_x$ basis. We emphasize here that we are
assuming strong projective measurements on individual
copies of a quantum preparation, rather than weak measurements
on physical ensembles such as those studied in
NMR experiments.

The outcomes of the measurement we label $d \in \{0, 1\}$,
where 0 and 1 refer to $|+\rangle$ and $|-\rangle$, respectively. An
experiment design consists of a specification of the time
$t$ that we evolve a qubit under $H$ before we measure.
The likelihood function for a given experiment $t$ is then
given by the Born rule $\Pr(0|\omega,t) = |\langle + | e^{-iHt} | + \rangle|^2$ and
$\Pr(1|\omega,t) = 1 - \Pr(0|\omega,t)$. Using our model Hamiltonian,
we can express the likelihood more simply as:

$$ \Pr(d|\omega,t) = \sin^2\!\left(\frac{\omega}{2} t\right)^{d} \cos^2\!\left(\frac{\omega}{2} t\right)^{1-d}. \tag{1} $$

Note that this model does not include noise. Below, we 
somewhat generalize this model by including limited vis- 
ibility and a T2 dephasing process. 
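As a concrete illustration of the noiseless model, the following minimal sketch evaluates the likelihood $\Pr(d|\omega,t)$, with $\Pr(1|\omega,t) = \sin^2(\omega t/2)$, and draws simulated outcomes. The function names, the seed, and the example values of $\omega$ and $t_k$ are assumptions made here for illustration only.

```python
import numpy as np

def likelihood(d, omega, t):
    """Pr(d | omega, t) for the noiseless model: Pr(1) = sin^2(omega t / 2)."""
    p1 = np.sin(omega * t / 2.0) ** 2          # probability of outcome d = 1
    return np.where(np.asarray(d) == 1, p1, 1.0 - p1)

def simulate(omega, times, rng):
    """Draw one outcome d_k in {0, 1} for each evolution time t_k."""
    times = np.asarray(times, dtype=float)
    p1 = np.sin(omega * times / 2.0) ** 2
    return (rng.random(times.shape) < p1).astype(int)

rng = np.random.default_rng(0)                 # seed chosen only for reproducibility
times = np.pi * np.arange(1, 6)                # illustrative times t_k = k * pi
data = simulate(0.3, times, rng)               # illustrative true value omega = 0.3
```

Each experiment yields a single bit, so an estimate of $\omega$ has to be built up from many such draws at suitably chosen times.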

If we desire an estimate $\hat{\omega}$ of the true value $\omega$, a commonly
used figure of merit is the squared error loss:

$$ L(\omega, \hat{\omega}) = |\omega - \hat{\omega}|^2 . $$



^1 As in the standard phase estimation protocol; see, e.g., [2].



The risk of an estimator, which is a function that takes
data sets $(D,T) := (\{d_k\}, \{t_k\})$ to estimates $\hat{\omega}(D,T)$, is
its expected performance with respect to the loss function:

$$ R(\omega, \hat{\omega}) = \sum_{D} \Pr(D|\omega, T)\, L(\omega, \hat{\omega}(D, T)). $$

For squared error loss, the risk is also called the mean
squared error (MSE).
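For a small number of shots the risk can be evaluated exactly by summing over all $2^N$ possible data records. The sketch below does this for the noiseless model; the grid-based posterior-mean estimator is a hypothetical stand-in chosen for illustration, not an estimator prescribed by the text, and the times and the value of $\omega$ are arbitrary.

```python
import itertools
import numpy as np

def data_probability(data, omega, times):
    """Pr(D | omega, T): product of single-shot likelihoods, Pr(1) = sin^2(omega t / 2)."""
    prob = 1.0
    for d, t in zip(data, times):
        p1 = np.sin(omega * t / 2.0) ** 2
        prob = prob * (p1 if d == 1 else 1.0 - p1)
    return prob

def exact_risk(omega, times, estimator):
    """Mean squared error: sum over all 2^N records D of Pr(D | omega, T) * loss."""
    risk = 0.0
    for data in itertools.product((0, 1), repeat=len(times)):
        estimate = estimator(data, times)
        risk += data_probability(data, omega, times) * (omega - estimate) ** 2
    return risk

def grid_posterior_mean(data, times, grid=np.linspace(0.0005, 0.9995, 1000)):
    """Hypothetical estimator: posterior mean under a uniform prior on (0, 1)."""
    weights = data_probability(data, grid, times)   # vectorized over the grid
    return float(np.sum(grid * weights) / np.sum(weights))

times = np.pi * np.arange(1, 7)                     # six shots at t_k = k * pi
risk = exact_risk(0.3, times, grid_posterior_mean)
```

The enumeration grows as $2^N$, so it is only practical for a handful of measurements; for larger $N$ the risk has to be estimated by sampling data records instead.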

Mean squared error lower bound. The difficulty here
is that the random outcomes of the measurements are
not identically distributed. In fact, since they depend on
the measurement time, each one could be different.
Although asymptotic results exist for non-identically distributed
random variables^2, these results are derived for
insufficient statistics, such as the sample mean. Moreover,
we desire to provide computationally tractable
heuristics that permit useful estimates with a finite number
of samples.

Although it is quite difficult to obtain exact expressions
for the risk for arbitrary measurement times, in some
cases we have obtained an asymptotically tight lower
bound. For unbiased estimators, we can appeal to the
Cramer-Rao bound [1]

$$ R(\omega, \hat{\omega}) \geq \frac{1}{I(\omega)}, \tag{2} $$

where

$$ I(\omega) = -\sum_{D} \Pr(D|\omega,T) \frac{\partial^2}{\partial \omega^2} \log \Pr(D|\omega,T) \tag{3} $$

is called the Fisher information. In our particular case,
the Fisher information reduces to quite a simple form,

$$ I(\omega) = \sum_{k=1}^{N} t_k^2, \tag{4} $$

which is conveniently independent of $\omega$ (a derivation is
given in Appendix A). Thus, the mean squared error is
lower bounded by

$$ R(\omega, \hat{\omega}) \geq \frac{1}{\sum_{k=1}^{N} t_k^2}. \tag{5} $$

Later we show that this bound becomes exponentially
suppressed when we include noise in our model. In general,
this quantity is dependent on the true parameter $\omega$.
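The claim that the noiseless Fisher information is the sum of the squared measurement times can be checked numerically. The sketch below uses the equivalent form $I = \sum_d (\partial_\omega p_d)^2 / p_d$ of the Fisher information and a central finite difference; the step size, the times, and the value of $\omega$ are arbitrary choices made for illustration.

```python
import numpy as np

def fisher_information(omega, t, h=1e-5):
    """Single-shot Fisher information, sum_d (dp_d/domega)^2 / p_d, by central differences."""
    def p1(w):
        return np.sin(w * t / 2.0) ** 2        # Pr(1 | omega, t) in the noiseless model
    dp1 = (p1(omega + h) - p1(omega - h)) / (2.0 * h)
    p = p1(omega)
    return dp1 ** 2 / p + dp1 ** 2 / (1.0 - p)  # dp_0 = -dp_1, so both terms share dp1^2

times = np.array([1.0, 2.5, 4.0])              # arbitrary measurement times
omega = 0.37                                   # arbitrary true parameter
numeric = sum(fisher_information(omega, t) for t in times)
analytic = np.sum(times ** 2)                  # sum of t_k^2
```

Repeating the comparison for other values of `omega` gives the same total, which is the $\omega$-independence noted above.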

The Bayesian solution considers the average of the risk,
called the Bayes risk, with respect to some prior $\pi(\omega)$:

$$ r(\pi, \hat{\omega}) = \int R(\omega, \hat{\omega})\, \pi(\omega)\, d\omega. $$



^2 The frequentist reference is [7], while a useful Bayesian reference
is [12].



As in references [5, 11], we choose a uniform prior for
$\omega \in (0, 1)$. Then, the final figure of merit is the average
mean squared error:

$$ r(\hat{\omega}) = \int_0^1 R(\omega, \hat{\omega})\, d\omega. $$

The goal is to find a strategy which minimizes this quantity.
Although there exist Bayesian generalizations of the
Cramer-Rao bound [?], ours is independent of $\omega$ and thus
remains unchanged by integrating equation (5) over the
parameter space:

$$ r(\hat{\omega}) \geq \frac{1}{\sum_{k=1}^{N} t_k^2}. \tag{6} $$

Note also that, in general, Bayesian Cramer-Rao bounds 
require fewer assumptions to derive than the standard 
(frequentist) bound. Although they are the same for this 
model, they differ for a more general model considered 
later. In broad strokes, the difference in practice between 
Bayesian and frequentist methods is averaging versus op- 
timization. Below we demonstrate a heuristic strategy 
which draws from both methods to achieve the goal of 
determining the measurement times which give the lowest
possible achievable bound on the Bayes risk.

Looseness of the Cramer-Rao bound. As useful as the
Bayesian Cramer-Rao lower bound (6) is, it is simple to
see that it is not always achievable. We can obtain a
lower bound by considering the best protocol we could
possibly hope for in any two-outcome experiment. In
such a protocol, one bit of experimental data provides
exactly one bit of certainty about the parameter $\omega$. If
we learn the bits of $\omega$ in sequence, at each step $k$, our
risk is upper bounded by the worst case where the
remaining bits of $\omega$ are either all 0 or all 1. In either
case, the error incurred by estimating a point between the
two extremes is given by $\sum_{j=k+2}^{\infty} 2^{-j} = 2^{-(k+1)}$, leading
to the best possible MSE after $N$ measurements being
$2^{-2(N+1)}$, even though we can make a smaller Cramer-Rao
bound by choosing times that grow faster than this
exponential function. Note that this risk is achievable
via the standard phase estimation protocol [2], but that
this protocol requires quantum resources which are not
part of our model.

Examples. Let us consider a couple of examples for
which the lower bound can be further simplified. First,
consider the case when all the measurement times are
the same. This is by far the simplest case, since the outcomes
become identically distributed. Recall $\omega \in (0, 1)$.
Then, the measurement time should be less than the first
Nyquist time, $t < \pi$, or the data will be consistent with
more than one $\omega$. That is, for $t > \pi$ (but less than $2\pi$,
say), the likelihood function will have two equally likely
maxima. We minimize the risk, then, by choosing $t = \pi$.
Then, the maximum likelihood estimator (MLE), for example,
will be asymptotically efficient [9], achieving the
Cramer-Rao lower bound

$$ r(\hat{\omega}_{\rm MLE}) = O\!\left(\frac{1}{N}\right). $$




Now consider a uniform grid of times. Since $\omega \in (0, 1)$,
we should choose the Nyquist sampling rate: $t_k = k\pi$.
Then, for any estimator $\hat{\omega}$ using data collected at these
measurement times, the Cramer-Rao bound gives

$$ r(\hat{\omega}) \geq O\!\left(\frac{1}{N^3}\right). $$




Again, the maximum likelihood estimator will be asymptotically
efficient. However, since the likelihood function
will have many local maxima, the maximum likelihood
estimator is non-trivial to find as gradient methods are
not guaranteed to work. Bayesian estimators were derived
in [11], where simulations yielded $\sim 1/N^3$ risk scaling,
which is asymptotically efficient.
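The two offline examples can be compared directly by evaluating the bound $1/\sum_k t_k^2$ for each schedule. In the sketch below the function name and the choice $N = 100$ are illustrative assumptions.

```python
import numpy as np

def cramer_rao_bound(times):
    """Lower bound 1 / sum_k t_k^2 on the mean squared error (noiseless model)."""
    times = np.asarray(times, dtype=float)
    return 1.0 / np.sum(times ** 2)

N = 100                                              # illustrative number of shots
k = np.arange(1, N + 1)
identical = cramer_rao_bound(np.full(N, np.pi))      # all shots at t = pi: 1 / (N pi^2)
uniform = cramer_rao_bound(k * np.pi)                # t_k = k pi: 6 / (pi^2 N (N+1) (2N+1))
```

The closed forms in the comments follow from $\sum_{k=1}^{N} k^2 = N(N+1)(2N+1)/6$, which makes the $1/N$ versus $1/N^3$ scaling of the two schedules explicit.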

Note that since we are considering a uniform spacing of
times, we can apply a Fourier estimation technique without
worrying about spectral aliasing introduced by
non-uniformity [10]. That is, we apply the discrete Fourier
transform and estimate the peak of the power spectrum.
Since the resolution in the frequency domain is $1/(N \Delta t)$,
we expect the Bayes risk to be

$$ r(\hat{\omega}_{\rm Fourier}) = O\!\left(\frac{1}{N^2}\right). $$




The sampling theorem requires that we sample from a 
deterministic function, not a probability distribution. In 
practice, this condition is often approximately satisfied 
by sampling some stable statistic such as the mean value 
of the distribution at each time. This can be achieved by 
measuring at the same time until a sufficiently accurate 
estimate of the mean at that time is obtained, then re- 
peating this for many other times. But as we have shown, 
this method can be quadratically improved by perform- 
ing every single measurement at a different time. 
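The Fourier procedure described above can be sketched as follows: repeat the measurement at each grid time to estimate the mean, transform, and read off the peak of the power spectrum. The number of grid points, the number of repetitions per time, the seed, and the true value of $\omega$ are all illustrative assumptions, and no windowing or peak interpolation is attempted.

```python
import numpy as np

rng = np.random.default_rng(1)               # seed chosen only for reproducibility
omega_true = 0.3                             # illustrative true parameter
N, shots = 64, 200                           # grid points and repetitions per point
times = np.pi * np.arange(N)                 # uniform (Nyquist) grid, Delta t = pi

# Repeated shots at each time give an estimate of Pr(1 | omega, t_k).
p1 = np.sin(omega_true * times / 2.0) ** 2
freq1 = rng.binomial(shots, p1) / shots
signal = 1.0 - 2.0 * freq1                   # estimates cos(omega t_k)

spectrum = np.abs(np.fft.rfft(signal)) ** 2
freqs = np.fft.rfftfreq(N)                   # cycles per sample, in [0, 1/2]
peak = 1 + np.argmax(spectrum[1:])           # skip the zero-frequency bin
omega_hat = 2.0 * freqs[peak]                # omega * pi = 2 pi f, so omega = 2 f
```

Note that this sketch spends `N * shots` measurements to reach a resolution set by `N` alone, which is the inefficiency the paragraph above points out.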

Exponentially achievable lower bound. It has been 
shown that Bayesian adaptive solutions lead to risk de- 
creasing exponentially with the number of measurements 
[11]. However, these results are given by fits to numerical
data. Here, we give an analytic lower bound on the risk 
of these protocols. 

The local (in time) Bayesian adaptive protocol can be
described as follows: (1) begin with a uniform prior $\Pr(\omega)$
and determine the first measurement time $t_1 \approx 1.136\pi$
which minimizes the average (over the two possible outcomes)
variance of the posterior distribution; (2) perform
a measurement at $t_1$, record the outcome $d_1$, and update
the distribution $\Pr(\omega) \mapsto \Pr(\omega|d_1,t_1)$ via Bayes' rule; (3)
repeat step (1) replacing the current prior with the current
posterior. Note that the expected variance in the
posterior is the Bayes risk. Thus, the protocol attempts
to minimize the risk assuming the next measurement is
the last. Strategies that are local in this sense are called
greedy strategies, as opposed to strategies which attempt
to minimize the risk over all future experiments.
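Step (1) of the protocol can be reproduced by brute force: for each candidate time, compute the posterior variance for both outcomes on a grid of $\omega$ values and average over the outcomes. The grid sizes and the search interval below are assumptions made for this sketch.

```python
import numpy as np

omegas = (np.arange(4000) + 0.5) / 4000      # midpoint grid on (0, 1), uniform prior

def expected_posterior_variance(t):
    """Average over d of Var[omega | d, t] for a uniform prior on (0, 1)."""
    p0 = np.cos(omegas * t / 2.0) ** 2       # Pr(0 | omega, t)
    total = 0.0
    for pd in (p0, 1.0 - p0):
        evidence = np.mean(pd)               # Pr(d | t)
        mean = np.mean(omegas * pd) / evidence
        second = np.mean(omegas ** 2 * pd) / evidence
        total += evidence * (second - mean ** 2)
    return total

ts = np.linspace(0.1, 3.0 * np.pi, 3000)     # candidate first measurement times
risks = np.array([expected_posterior_variance(t) for t in ts])
t1 = ts[np.argmin(risks)]                    # greedy choice for the first shot
```

Later steps repeat the same search with the posterior in place of the uniform prior, which is the optimization that becomes costly when it must be solved after every measurement.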

For some choices of measurement times, including
those given by the protocol above, the posterior will be
approximately normally distributed^3. This is guaranteed
in the asymptotic limit, but the posterior distribution
near its peak is also remarkably well approximated by a
Gaussian after as few as 15 reasonably chosen measurements
(we found a uniform grid $t_k = k\pi$ to be sufficient
for "warming up" to the Gaussian approximation). Thus,
we approximate the current distribution (given some
sufficiently long measurement record $D$) as

$$ \Pr(\omega|D) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(\omega-\mu)^2}{2\sigma^2}\right), $$

with some arbitrary mean $\mu$ and variance $\sigma^2$ implied by
$D$. The expected posterior variance (which is equal to
the Bayes risk) of the probability distribution of the next
measurement is

$$ r(t) = \sigma^2 \left(1 - \frac{t^2 \sigma^2 \sin^2(\mu t)}{e^{t^2\sigma^2} - \cos^2(\mu t)}\right) \tag{7} $$



(derived in Appendix B) which oscillates with frequency
$2\mu$ within an envelope $\sigma^2 (1 - t^2\sigma^2 e^{-t^2\sigma^2})$. Asymptotically,
the minimum risk will approach the minimum of
the envelope for all $\mu$, but will be a lower bound on the
risk otherwise. This minimum occurs at $t = 1/\sigma$ with a risk
of $r(t) = (1 - e^{-1})\sigma^2$, which is also the variance of the
updated probability distribution since both outcomes are
equally probable at $t$. Thus, at each measurement step
we reduce the risk by a factor of $1 - e^{-1} \approx 0.632 \approx e^{-0.458} \approx 2^{-0.66}$.
Thus, the risk scales exponentially as $r \sim \sigma^2 (1 - e^{-1})^N$
and is achieved at measurement times which scale as

$$ t_k = \frac{1}{\sigma (1 - e^{-1})^{k/2}} \approx \frac{1.26^k}{\sigma}. $$
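The closed-form risk for a Gaussian prior and its envelope can be checked against a brute-force Bayes update. In this sketch the grid resolution and the values $\mu = 0.4$, $\sigma^2 = 10^{-3}$ are arbitrary illustrative choices, and `risk` encodes the expression $\sigma^2 [1 - t^2\sigma^2 \sin^2(\mu t) / (e^{t^2\sigma^2} - \cos^2(\mu t))]$.

```python
import numpy as np

def risk(t, mu, sigma2):
    """Closed-form expected posterior variance for a N(mu, sigma2) prior."""
    x = sigma2 * t ** 2
    return sigma2 * (1.0 - x * np.sin(mu * t) ** 2 / (np.exp(x) - np.cos(mu * t) ** 2))

def envelope(t, sigma2):
    """Lower envelope sigma^2 (1 - t^2 sigma^2 exp(-t^2 sigma^2))."""
    x = sigma2 * t ** 2
    return sigma2 * (1.0 - x * np.exp(-x))

def risk_numeric(t, mu, sigma2, n=20001):
    """Same quantity by brute-force Bayes updates on a grid of omega values."""
    sigma = np.sqrt(sigma2)
    w = np.linspace(mu - 10 * sigma, mu + 10 * sigma, n)
    prior = np.exp(-(w - mu) ** 2 / (2 * sigma2))
    prior /= prior.sum()
    p0 = np.cos(w * t / 2.0) ** 2
    total = 0.0
    for pd in (p0, 1.0 - p0):
        post = prior * pd
        evidence = post.sum()
        mean = (w * post).sum() / evidence
        total += evidence * ((w ** 2 * post).sum() / evidence - mean ** 2)
    return total

mu, sigma2 = 0.4, 1e-3                                # arbitrary illustrative prior
ts = np.linspace(1.0, 100.0, 5000)
step_factor = envelope(ts, sigma2).min() / sigma2     # close to 1 - 1/e
```

The reciprocal square root of `step_factor` is the per-step growth rate of the optimal times, roughly 1.26.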



These times are guaranteed to be optimal only in the 
asymptotic limit. For finite numbers of samples, we sug- 
gest two simple heuristics. First, we suggest the use of 
exponentially increasing times, where the base of the ex- 
ponent is optimized offline, followed by the use of the 
maximum likelihood estimator for these times. Second, 
we suggest a simpler adaptive scheme based on the as- 
sumption that the distribution remains Gaussian after 
each measurement. Making use of this normality as- 
sumption, we only need update equations for the mean 
and variance of the distribution over $\omega$. In deriving the
update equations, we also take into account the oscillations
of the expected Bayes risk by finding the nearest
achievable minima to the one given by the lower bound.
We provide the update equations in Appendix C.

^3 This is true asymptotically and higher order corrections can be
used if required [12].

FIG. 1: The Bayes risk - the average (over a uniform prior)
mean (over data) squared error - of the strategies discussed
in the paper. Data points are at evenly spaced measurement
numbers $N \in \{16, 20, 24, \ldots, 124\}$ and the lines are linear
interpolants to guide the eye. Each data point is the average
of $10^{?}$ simulations. In each figure, the noise parameter
$\eta = 1$ since its inclusion only gives a constant offset.
From top to bottom, the relaxation characteristic time is
$T_2 = \infty$, $10^{?}\pi$, $10^{?}\pi$. The thin solid lines indicate the lower
bound given by Equation (10).

Generalization to finite $T_2$. In practice, we will have to
consider not only experimental restrictions but also noise
and relaxation processes. Processes which do not affect
the quantum state can be effectively modeled by random
bit-flip errors occurring with probability $1 - \eta$. Processes
which do affect the quantum state (decoherence)
are modeled by an exponential decay of phase coherence^4
with characteristic time $T_2$. Since the state being measured
lies in the xy-plane of the Bloch sphere, this loss
of phase coherence manifests as an exponentially decaying
envelope being applied to the original likelihood (1). The
model is thus fully specified by the likelihood function

$$ \Pr(0|\omega,t,\eta,T_2) = \eta \left[ e^{-t/T_2} \cos^2\!\left(\frac{\omega}{2} t\right) + \frac{1 - e^{-t/T_2}}{2} \right] + \frac{1-\eta}{2}. \tag{8} $$



The Cramer-Rao bound is now given by

$$ R(\omega, \hat{\omega}) \geq \frac{1}{\sum_{k=1}^{N} \dfrac{t_k^2 \eta^2 \sin^2(\omega t_k)}{e^{2 t_k / T_2} - \eta^2 \cos^2(\omega t_k)}}. \tag{9} $$

Note that unlike the Cramer-Rao bound (5) for the noiseless
case, the above bound is not independent of $\omega$ and
thus we must appeal to the Bayesian Cramer-Rao bound
so that the measurement times can be chosen independently
of the true parameter. However, the Bayesian
bound turns out to be very loose. A sharper bound is
given by first upper bounding each term in the denominator
to give

$$ R(\omega, \hat{\omega}) \geq \frac{1}{\eta^2 \sum_{k=1}^{N} t_k^2 e^{-2 t_k / T_2}}. $$



The noise term (or visibility) $\eta$ simply gives a constant
reduction in the achievable accuracy. The relaxation
process provides a more interesting dynamic as we see
that the gains from longer times are exponentially suppressed.
In other words, strategies are restricted to explore
$t_k \leq T_2$. Since $t^2 e^{-2t/T_2}$ is maximized at $t = T_2$,
we can thus do no better than

$$ r(\hat{\omega}) \geq \frac{e^2}{\eta^2 N T_2^2}. \tag{10} $$
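The effect of relaxation on a given schedule can be examined with the $\omega$-independent bound obtained by replacing each Fisher-information term with its maximum value $\eta^2 t_k^2 e^{-2 t_k/T_2}$. The sketch below compares an exponentially sparse schedule with the value reached when every shot is taken at $t = T_2$; the function names and the value of $T_2$ are illustrative assumptions.

```python
import numpy as np

def t2_bound(times, T2, eta=1.0):
    """Omega-independent bound 1 / (eta^2 sum_k t_k^2 exp(-2 t_k / T2))."""
    times = np.asarray(times, dtype=float)
    return 1.0 / (eta ** 2 * np.sum(times ** 2 * np.exp(-2.0 * times / T2)))

def saturated_bound(N, T2, eta=1.0):
    """Value of the bound when every shot is taken at t = T2."""
    return np.e ** 2 / (eta ** 2 * N * T2 ** 2)

N, T2 = 124, 100.0 * np.pi                  # T2 is an arbitrary illustrative value
k = np.arange(1, N + 1)
sparse = t2_bound((9.0 / 8.0) ** k, T2)     # exponentially sparse times
floor = saturated_bound(N, T2)              # smallest value the bound can take
```

Shots taken at times far beyond $T_2$ contribute essentially nothing to the sum, so a schedule that keeps growing exponentially stops improving the bound once it passes $T_2$.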



The adaptive strategy discussed above can be generalized
to include noise and relaxation but the expressions
are more lengthy (see Appendix B). To illustrate
the performance of our adaptive strategy, we simulate
it along with offline strategies using
identical times ($t_k = \pi$), linearly spaced times ($t_k = k\pi$)
and exponentially sparse times ($t_k = (9/8)^k$). For each
strategy, we perform simulations for experiments consisting
of different numbers of samples $N$, up to $N = 124$,
and repeat each such simulation $10^{?}$ times to obtain an estimate
of the Bayes risk for that strategy and experiment
size. In Fig. 1 we present the results of these simulations
for the noiseless case, and for the cases $T_2 = 10^{?}\pi$ and
$T_2 = 10^{?}\pi$.

^4 We do not include amplitude damping in our model since our
populations remain equal throughout evolution and thus $T_1$ only
manifests as a contribution to $T_2$.


Note that in all cases, the adaptive strategy achieves
exponential scaling until the times selected reach $t = T_2$.
At that point, the risk will then scale as $1/N$ if
the remaining measurement times are $t = T_2$. However,
if the protocol continues to select larger measurement
times, the information gained from those measurements
will tend to zero and the risk will remain constant.

Summary and conclusions. By using the Cramer-Rao
bound along with analytic expressions for the variance of
each posterior distribution, we have motivated a heuristic
method for choosing experiment designs that asymptotically
admits exponentially small error scaling in the
number of measurements. For finite numbers of measurements, we
have relied on numerical simulation to demonstrate that
this scaling is well-achieved even for $N \sim 120$. Numerical
simulations for finite $T_2$, moreover, have suggested that
we can enjoy exponential scaling of the risk until the measurement
times saturate the $T_2$ bound, at which point the
risk scaling switches to the asymptotic scaling of $1/N$.
In both cases, the heuristics used to design experiments
are quite computationally tractable, thus motivating the
utility of our heuristics to actual experimental practice.
Appendix A: Derivation of Cramer-Rao Bounds

In this Appendix, we show that for the simple model represented by the likelihood function presented in equation
(1), the Fisher information given by (3) reduces to the form claimed in (4). To show this, we first note that the
likelihood for a vector $D = (d_1, d_2, \ldots, d_N)$ of observations at times $T = (t_1, \ldots, t_N)$ is given by a product of the
likelihoods for each individual measurement,

$$ \Pr(D|\omega,T) = \prod_k \Pr(d_k|\omega,t_k). $$

Thus, the log-likelihood function is simply a sum over the individual log-likelihoods. Since the derivative operator
commutes with summation, we obtain that

$$ \frac{\partial^2}{\partial\omega^2} \log\Pr(D|\omega,T) = \sum_k \frac{\partial^2}{\partial\omega^2} \log\Pr(d_k|\omega,t_k). $$

This in turn implies that the Fisher information for a vector of measurements is given by the sum for each measurement
of that measurement's Fisher information.

To calculate the single-measurement Fisher information, we find the second derivative of the log-likelihood for a
single measurement is given by

$$ \frac{\partial^2}{\partial\omega^2}\log\Pr(d_k|\omega,t_k) = t_k^2\, \frac{(2d_k-1)\left(1-2d_k+\cos(\omega t_k)\right)}{\left((2d_k-1)\cos(\omega t_k)-1\right)^2}. $$

Thus, we find that the single-measurement Fisher information is given by

$$ I(\omega|t_k) = -\sum_{d_k\in\{0,1\}} \Pr(d_k|\omega,t_k)\frac{\partial^2}{\partial\omega^2}\log\Pr(d_k|\omega,t_k) = t_k^2 \sum_{d_k\in\{0,1\}} \frac{(2d_k-1)\left(1-2d_k+\cos(\omega t_k)\right)}{2(2d_k-1)\cos(\omega t_k)-2} = t_k^2. $$

We conclude that $I(\omega|T) = \sum_k t_k^2$, as claimed.

For the model with finite $T_2$ and limited visibility, given by the likelihood function (8), we can follow the same
logic. We find the second derivative of the logarithm of (8) with respect to $\omega$ gives us

$$ \frac{\partial^2}{\partial\omega^2}\log\Pr(d_k|\omega,t_k) = \eta\, t_k^2\, \frac{(2d_k-1)\left(\eta(1-2d_k) + e^{t_k/T_2}\cos(\omega t_k)\right)}{\left(\eta(1-2d_k)\cos(\omega t_k) + e^{t_k/T_2}\right)^2}. $$

The negative of the expected value of this derivative then gives us the Fisher information for a single measurement in the finite-$T_2$
model,

$$ I(\omega|t_k) = \frac{\eta^2 t_k^2 \sin^2(\omega t_k)}{e^{2t_k/T_2} - \eta^2\cos^2(\omega t_k)}. $$

Taking the sum of this information then produces the Cramer-Rao bound given in (9).
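The finite-$T_2$ Fisher information can be checked numerically in the same way as the noiseless case. The sketch below assumes the likelihood in the equivalent form $\Pr(0|\omega,t,\eta,T_2) = [1 + \eta e^{-t/T_2}\cos(\omega t)]/2$ and uses $I = \sum_d (\partial_\omega p_d)^2/p_d$; the parameter values and the finite-difference step are arbitrary.

```python
import numpy as np

def p0(omega, t, eta, T2):
    """Assumed form of Pr(0 | omega, t, eta, T2): (1 + eta exp(-t/T2) cos(omega t)) / 2."""
    return 0.5 * (1.0 + eta * np.exp(-t / T2) * np.cos(omega * t))

def fisher_numeric(omega, t, eta, T2, h=1e-5):
    """Single-shot Fisher information sum_d (dp_d/domega)^2 / p_d by central differences."""
    dp = (p0(omega + h, t, eta, T2) - p0(omega - h, t, eta, T2)) / (2.0 * h)
    p = p0(omega, t, eta, T2)
    return dp ** 2 / p + dp ** 2 / (1.0 - p)

def fisher_analytic(omega, t, eta, T2):
    """Closed form eta^2 t^2 sin^2(omega t) / (exp(2 t / T2) - eta^2 cos^2(omega t))."""
    return (eta * t * np.sin(omega * t)) ** 2 / (
        np.exp(2.0 * t / T2) - (eta * np.cos(omega * t)) ** 2)

omega, t, eta, T2 = 0.37, 4.0, 0.9, 20.0     # arbitrary illustrative values
numeric = fisher_numeric(omega, t, eta, T2)
analytic = fisher_analytic(omega, t, eta, T2)
```

Setting `eta = 1` and letting `T2` grow large recovers the noiseless value $t^2$.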



Appendix B: Asymptotic Scaling of the Bayes Risk 

In this Appendix, we derive expressions for posterior distributions under the assumption of a normally-distributed 
prior, and then apply these expressions to show the asymptotic scaling of the Bayes risk. We also derive update rules 
that allow for expedient implementation of the greedy algorithm described in the main text. 

Under the assumption of a normally-distributed prior, all prior information about the parameter $\omega$ can be characterized
by the mean $\mu$ and variance $\sigma^2$ of the prior distribution. Thus, we shall write our priors as $\Pr(\omega|\mu, \sigma^2)$ to reflect
the assumption of normality. The probability of obtaining a datum $d$ at time $t$ given such prior information is
then given by

$$ \Pr(d|t; \mu, \sigma^2) = \int \Pr(d|t,\omega) \Pr(\omega|\mu, \sigma^2)\, d\omega = \frac{1}{4}\left[2 - (2d-1)\left(1 + e^{2i\mu t}\right) e^{-\frac{1}{2}t(\sigma^2 t + 2i\mu)}\right]. $$

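Applying Bayes' rule then produces the posterior distribution

$$ \Pr(\omega|d,t;\mu,\sigma^2) = \frac{\Pr(\omega|\mu,\sigma^2)\Pr(d|t,\omega)}{\Pr(d|t;\mu,\sigma^2)} = \sqrt{\frac{2}{\pi}}\; \frac{e^{-(\omega-\mu)^2/2\sigma^2}\left((1-2d)\cos(t\omega)+1\right)}{\sigma\left(2-(2d-1)\left(1+e^{2i\mu t}\right)e^{-\frac{1}{2}t(\sigma^2 t+2i\mu)}\right)}. $$

The mean and variance of this distribution are given by:

$$ \mathbb{E}[\omega|d,t;\mu,\sigma^2] = \frac{2\left((2d-1)e^{-\frac{1}{2}\sigma^2t^2}\left(\sigma^2 t\sin(\mu t) - \mu\cos(\mu t)\right) + \mu\right)}{2-(2d-1)\left(1+e^{2i\mu t}\right)e^{-\frac{1}{2} t(\sigma^2 t+2i\mu)}}, $$

$$ \mathbb{V}[\omega|d,t;\mu,\sigma^2] = \mu^2+\sigma^2 - \left(\frac{2\left((2d-1)e^{-\frac{1}{2}\sigma^2t^2}\left(\sigma^2 t\sin(\mu t) - \mu\cos(\mu t)\right) + \mu\right)}{2-(2d-1)\left(1+e^{2i\mu t}\right)e^{-\frac{1}{2} t(\sigma^2 t+2i\mu)}}\right)^2 - \frac{2(2d-1)\sigma^2 t\, e^{i\mu t}\left(\sigma^2 t\cos(\mu t)+2\mu\sin(\mu t)\right)}{(2d-1)\left(1+e^{2i\mu t}\right) - 2e^{\frac{1}{2} t(\sigma^2t+2i\mu)}}. $$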

To choose optimal times, we wish to pick $t$ so as to minimize the expected value of the variance, where this
expectation is taken over possible data. Based on the previous expressions, we find that

$$ \mathbb{E}_d\left[\mathbb{V}[\omega|d,t;\mu,\sigma^2]\right] = \sigma^2\left(1 - \frac{t^2\sigma^2\sin^2(\mu t)}{e^{t^2\sigma^2} - \cos^2(\mu t)}\right), $$

in agreement with Equation (7).

This expected variance, which describes the risk incurred by measuring at a given $t$, is bounded below by an
envelope $E(t,\sigma^2) = \sigma^2 \left(1 - t^2\sigma^2 e^{-t^2\sigma^2}\right)$. A pair of examples of the envelope $E(t, \sigma^2)$ and achievable risk $r(t; \mu, \sigma^2)$ is
illustrated in Figure 2.

Note that the envelope is minimized by $t = \mathrm{argmin}_t E(t, \sigma^2) = 1/\sigma$. Moreover, the expected variance saturates
the lower bound at intervals in $t$ of $\pi/\mu$, but the width of the envelope's minimum grows as $1/\sigma^2$, so that as more
measurements are performed, the bound becomes a good approximation for the minimum achievable risk. Thus, in
the asymptotic limit of large numbers of experiments, we have that the risk scales with each step as the minimum
of the envelope,

$$ \frac{\min_t E(t,\sigma^2)}{\sigma^2} = 1 - e^{-1} \approx 0.632. $$



FIG. 2: The risk envelope $E(t,\sigma^2)$, and the risk $r(t;\mu,\sigma^2) \geq E(t,\sigma^2)$ for the examples where $\mu = 0.4$ and $\sigma^2 = 10^{-?}$ (left)
and $\sigma^2 = 5 \times 10^{-?}$ (right). Note that as $\sigma^2$ shrinks, the intersections between $E$ and $r$ (marked by dots) become more tightly
packed.



We conclude that in the asymptotic limit, the risk decays as $e^{N \ln 0.632} \approx e^{-0.458 N}$, where $N$ is the number of
measurements performed.



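Appendix C: Update Equations for $\mu$, $\sigma^2$

In this Appendix, we state without derivation the update rules for $\mu$ and $\sigma^2$ after obtaining a measurement result
$d$ from an experiment performed at time $t$, under the assumption of a normal prior. For the simple model described
by Equation (1),

$$ \mathbb{E}[\omega|d] = \mu - \frac{\pi(2d-1)\sigma^2(-1)^k(2k-1)}{2\mu}\exp\!\left(-\frac{\pi^2\sigma^2(1-2k)^2}{8\mu^2}\right), \tag{11} $$

$$ \mathbb{V}[\omega|d] = \sigma^2 - \frac{\pi^2(1-2d)^2\sigma^4(1-2k)^2}{4\mu^2}\exp\!\left(-\frac{\pi^2\sigma^2(1-2k)^2}{4\mu^2}\right), \tag{12} $$

where $k = \mathrm{round}\left[\frac{\mu}{\pi\sigma} + \frac{1}{2}\right]$ is used to pick the intersection of $E(t, \sigma^2)$ and $r(t; \mu, \sigma^2)$ closest to the minimum of $E$, as described
in Appendix B.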
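A minimal sketch of the resulting adaptive heuristic for the noiseless model is given below. It assumes the measurement time associated with index $k$ is $t = (2k-1)\pi/(2\mu)$, where $\sin^2(\mu t) = 1$, so that the mean and variance updates can be written in terms of $t$. The starting moments (those of the uniform prior), the seed, the true value of $\omega$, the guard `max(1, ...)`, and the omission of any warm-up phase are choices made here for illustration.

```python
import numpy as np

def choose_time(mu, sigma2):
    """Time t = (2k - 1) pi / (2 mu) closest to the envelope minimum 1 / sigma."""
    k = max(1, int(round(mu / (np.pi * np.sqrt(sigma2)) + 0.5)))   # guard: k >= 1
    return k, (2 * k - 1) * np.pi / (2.0 * mu)

def update(mu, sigma2, d, k):
    """Gaussian update of mean and variance after outcome d in {0, 1}."""
    t = (2 * k - 1) * np.pi / (2.0 * mu)
    x = sigma2 * t ** 2
    new_mu = mu - (2 * d - 1) * (-1) ** k * sigma2 * t * np.exp(-x / 2.0)
    new_sigma2 = sigma2 * (1.0 - x * np.exp(-x))
    return new_mu, new_sigma2

rng = np.random.default_rng(2)               # seed chosen only for reproducibility
omega_true = 0.3                             # illustrative true parameter
mu, sigma2 = 0.5, 1.0 / 12.0                 # moments of the uniform prior on (0, 1)
variances, times = [sigma2], []
for _ in range(20):
    k, t = choose_time(mu, sigma2)
    d = int(rng.random() < np.sin(omega_true * t / 2.0) ** 2)   # simulated outcome
    mu, sigma2 = update(mu, sigma2, d, k)
    variances.append(sigma2)
    times.append(t)
```

Only two numbers are stored and updated per shot, which is what makes this scheme much cheaper than re-optimizing over the full posterior. The tracked variance shrinks by a factor between $1 - e^{-1}$ and 1 at every step, but it reflects the true error only to the extent that the Gaussian approximation holds.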

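For the finite-$T_2$ model (with $\eta = 1$), the updated mean and variance are given by

$$ \mathbb{E}[\omega|d] = \mu - \frac{\pi(2d-1)(-1)^k(2k-1)\sigma^2}{2\mu}\exp\!\left(-\frac{\pi(2k-1)\left(4\mu + \pi(2k-1)\sigma^2 T_2\right)}{8\mu^2 T_2}\right), \tag{13} $$

$$ \mathbb{V}[\omega|d] = \sigma^2 - \frac{\pi^2(2d-1)^2(2k-1)^2\sigma^4}{4\mu^2}\exp\!\left(-\frac{\pi(2k-1)\left(4\mu + \pi(2k-1)\sigma^2 T_2\right)}{4\mu^2 T_2}\right), \tag{14} $$

where in this case,

$$ k = \mathrm{round}\left[\frac{\mu\sqrt{4\sigma^2T_2^2 + 1} - \mu + \pi\sigma^2T_2}{2\pi\sigma^2T_2}\right]. $$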